1. INTRODUCTION

Rule-based expert systems have moved from a research activity in a small number of
academic computer science departments to a growing commercial activity. This transition
clearly indicates that the structure of a complex computer program enforced by a rule-based
system (namely, the clear separation of the decision-making process, the inference engine,
from the data on which the decisions are based, the rule base) is a useful step in the
evolution of programming strategies. At the same time there has been a growing
recognition that in most decision-making situations the data (namely, the rule base and the
initial evidence used to start the decision-making process) are not known with certainty, and
consequently the inference procedures used in traditional rule-based systems are
inappropriate. Over the last decade a number of inference procedures which use various
numerical representations of uncertainty have been developed for use in rule-based systems.
However, for a variety of reasons (including the fact that there is little logical basis for the
representations), none of them has been widely successful.

In this paper we describe the current state of an ongoing research project which is
attempting to use probability as the mechanism for representing uncertainty in a rule-based
system. A previous report was given in Eddy and Pei (1986). We have been constrained
in our development of a probability-based expert system by a number of external
considerations, the most important of which we delineate here.
The single most important constraint is that we are doing our development in the context
of an existing rule-based expert system, and as such we must limit the modifications we
might wish to make to the system. In particular, we wish to limit our modifications to the
inference engine only. This is not an overly serious constraint, and it enforces a certain
locality on the nature of the possible computations. Exactly this locality of computation
will be required if the system is ever to be scaled up to a rule base containing thousands
or millions of rules. Barnett (1981) has induced the same kind of locality of computation
in a system very similar to ours, but at the cost of assuming unrealistic independence in
various parts of the rule base; we discuss Barnett's work further in Section 5.

A second important constraint is that, in practice, any numerical expressions of
uncertainty about data are themselves quite uncertain, and as such we wish to allow for the
expression of uncertainty about the uncertainties. There are a number of possible ways to
do this; we have chosen what appears to us the simplest. Precisely, we are using belief
functions (Shafer, 1976) to represent sets of probability distributions. We discuss some of
the details of belief functions in Section 4.

There are at least two parties involved in the development and use of a rule-based expert
system: the expert, who expresses the rules, and the user, who causes the system to run
by supplying it with some initial external evidence. An early planned use of the system we
are developing was for game playing, to evaluate possible strategies. Initially we felt that
it was important for the two players, the expert and the user, to be able to interchange
roles without affecting the results. This turns out to be a quite complex constraint; a
simplified version of it requires that the system perform properly (i.e., get the "right"
answer) if the expert and the user are one and the same individual.

The remainder of this paper is organized as follows. In the next section we give a very
brief introduction to the details of a rule-based expert system. In Section 3 we describe
what, to us, are the most natural methods for incorporating uncertainty into a rule-based
system. In Section 4 we provide a few of the formal details of belief functions and
describe some of their properties. In Section 5 we discuss various possible approximation
techniques which will speed up the computations. In Sections 6 and 7 we provide detailed
properties of the approximations we have developed. The essence of the approximations is
to force the belief function to have a simpler form; an extreme form of this simplification
occurs when the belief function represents a unique probability distribution. In Section 8
we briefly describe some of the features of our implementation of this theory in a LISP
computer program.


2. RULE-BASED EXPERT SYSTEMS

A rule-based expert system (or production system) consists of a collection of production
rules together with a system for linking or "chaining" the rules to simulate a human
expert's reasoning process. A production rule (or, simply, a rule) is a statement of the
form "If A then B," where A and B are logical propositions.


The mechanism used for chaining rules is generally one of two kinds: forward chaining or
backward chaining. In the forward chaining scheme the user of the system supplies some
evidence, generally of the form "A is true," and the system then uses this evidence
together with the rules to reason towards conclusions or goals. Forward chaining is
generally described as causal or deductive reasoning. In the backward chaining scheme the
system attempts to satisfy its goals by finding rules which, if true, would imply those
goals. It repeats this process until it is compelled, by the lack of any rules implying its
current goals, to ask the user whether one or more of those goals (the antecedents of
certain rules) are true. If the user accedes, this is deemed to be evidence that the rule is
true. Backward chaining is generally referred to as diagnostic reasoning. One crucial
computational problem in either form of reasoning is how to discover rules in the rule base
with given antecedents (forward chaining) or with given consequents (backward chaining).
Currently the only general strategy is to search over the entire rule base. Some savings can
be made by "remembering" the results of previous searches so they can be "looked up" in a
table.
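The two chaining schemes, and the idea of replacing repeated searches of the rule base with table look-ups, can be sketched in a few lines. This is a minimal Python illustration, not the authors' LISP program: the representation of a rule as a pair (set of antecedent names, consequent name) and the `ask` callback standing in for the user are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical representation: a rule "If A then B" is a pair
# (antecedents, consequent), where antecedents is a frozenset of proposition names.
Rule = tuple[frozenset[str], str]


def build_indexes(rules: list[Rule]):
    """One pass over the rule base that 'remembers' where each proposition occurs,
    so later searches become table look-ups."""
    by_antecedent = defaultdict(list)  # proposition -> rules using it (forward chaining)
    by_consequent = defaultdict(list)  # proposition -> rules concluding it (backward chaining)
    for rule in rules:
        antecedents, consequent = rule
        for a in antecedents:
            by_antecedent[a].append(rule)
        by_consequent[consequent].append(rule)
    return by_antecedent, by_consequent


def forward_chain(by_antecedent, evidence: set[str]) -> set[str]:
    """Reason from evidence ("A is true") towards every reachable conclusion."""
    known = set(evidence)
    agenda = list(evidence)
    while agenda:
        fact = agenda.pop()
        # Only rules that mention the newly established fact need re-examination.
        for antecedents, consequent in by_antecedent.get(fact, []):
            if consequent not in known and antecedents <= known:
                known.add(consequent)
                agenda.append(consequent)
    return known


def backward_chain(by_consequent, goal: str, ask, visiting: frozenset = frozenset()) -> bool:
    """Try to establish `goal`; `ask(p)` stands in for querying the user about p."""
    rules_for_goal = by_consequent.get(goal, [])
    if not rules_for_goal:
        return ask(goal)  # no rule implies this goal, so ask the user
    if goal in visiting:
        return False      # guard against cyclic chains of rules
    visiting = visiting | {goal}
    return any(
        all(backward_chain(by_consequent, a, ask, visiting) for a in antecedents)
        for antecedents, _ in rules_for_goal
    )
```

With the rules "If A then B" and "If B and C then D", forward chaining from the evidence {A, C} reaches D, and backward chaining on the goal D ends up asking the user about A and C.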
3. PROBABILITIES AND RULES

There is currently no generally accepted method for incorporating uncertainty into a
rule-based expert system. One method which seems appealing at first glance is to treat the
user's probabilities on the evidence as a prior opinion and the expert's probabilities on the
rules as a likelihood, and simply use Bayes' rule. In this method, we would expect the
expert constructing the system to have a joint probability distribution on the assignment of
truth values to the propositions which are consequents of all the rules in the system. This
joint probability distribution would be conditional on the assignment of truth values to those
propositions in the system which are antecedents of some rule and not consequents of any
rule (the evidence nodes). Also, we would expect the user of the system to have a joint
probability distribution on the assignment of truth values to these evidence nodes. There
are a number of obvious difficulties with this scheme:


1. It is unreasonable to expect anyone to express a joint probability distribution on
   the assignment of truth values to a large collection of propositions, for two reasons:

   a. the size of the collection of propositions;

   b. the inherent uncertainty in the expressed probability distribution.

2. The amount of calculation required is overwhelming, being exponential in the
   number of propositions in the system.

3. The symmetry constraint mentioned in the introductory section is obviously not
   satisfied.

There are also a few subtler problems:

1. The pooling of expert and user opinion via Bayes' rule would appear to be
   inappropriate. More precisely, use of Bayes' rule to pool the probability
   distributions of two individuals has no logical basis unless one of the individuals
   declares the probability distribution of the other individual to be his own.

2. Both the expert and the user can reasonably be expected to have a joint
   probability distribution on the assignment of truth values to all the propositions
   in the system. The use of lower-dimensional marginal and conditional probability
   distributions from these two higher-dimensional joint distributions appears to
   discard potentially useful information.
A second method for incorporating uncertainty into a rule-based expert system pools the
opinions of the expert and the user. We would expect the expert constructing the system
to have a joint probability distribution on the assignment of truth values to all the
propositions in the system, and we would expect the user to also have such a joint
probability distribution. This second method can, by the appropriate choice of a pooling
rule, satisfy the symmetry constraint mentioned above.

If it is possible to decompose each joint probability distribution so that a piece of the
decomposition can be attached to a small number of propositions, and if this piece can be
combined with another piece so that the entire joint distribution can be recovered, then the
difficulty of assigning a joint probability distribution on the assignment of truth values to a
large collection of propositions may be overcome. One such decomposition is the conditional
one; it would be desirable to have a decomposition that is symmetric, so that the order of
composition is unimportant. Spiegelhalter (1986) has proposed a mechanism for allowing
the conditional decomposition to be symmetric.
We also allow the expression of uncertainty about probabilities by using belief functions
as lower bounds on the probabilities. This will allow us to alleviate the first of the three
obvious difficulties mentioned above. It does not seem possible to significantly reduce the
computational requirements mentioned in the second difficulty; however, in Sections 5, 6,
and 7 we discuss approximations which provide some reduction in the computational burden
(see Eddy and Pei, 1986, for an alternative scheme).

4. BELIEF FUNCTIONS

Following Shafer (1976), let $\Theta$ be a set of mutually exclusive and exhaustive
propositions. Let $2^\Theta$ be the set of all subsets of $\Theta$; elements of $2^\Theta$ can be
interpreted as general propositions in the problem domain. A basic probability assignment
is a function $m(\cdot)$ from $2^\Theta$ into $[0, 1]$ which satisfies

$$0 \le m(S) \le 1, \qquad m(\emptyset) = 0,$$

and

$$\sum_{S \subseteq \Theta} m(S) = 1.$$

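There is a one-to-one correspondence between this basic probability assignment and the
belief function, $\mathrm{Bel}(\cdot)$, and plausibility function, $\mathrm{Pl}(\cdot)$, given by

$$\mathrm{Bel}(S) = \sum_{T \subseteq S} m(T),$$

$$m(S) = \sum_{T \subseteq S} (-1)^{|S - T|}\, \mathrm{Bel}(T),$$

and

$$\mathrm{Pl}(S) = 1 - \mathrm{Bel}(\bar{S}),$$

where $\bar{S} = \Theta - S$ is the complement of $S$.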

It is apparent that $\mathrm{Bel}(A) \le \Pr(A) \le \mathrm{Pl}(A)$, where $\Pr(A)$ is the
probability of $A$. When $\mathrm{Bel}(A)$ equals $\mathrm{Pl}(A)$ for every element $A$ of
$2^\Theta$, the values correspond to probabilities. This implies that the function $m$ takes
non-zero values on the singletons only.
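The correspondence between $m$, Bel and Pl is easy to check numerically. The sketch below assumes a hypothetical representation of a basic probability assignment as a Python dict from `frozenset` focal elements to `Fraction` masses (exact arithmetic, so the round trip can be tested with `==`); `mobius` implements the inversion formula for $m(S)$ given above.

```python
from fractions import Fraction
from itertools import combinations


def subsets(s):
    """Every subset of s (including the empty set and s itself) as a frozenset."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def bel(m, a):
    """Bel(A): total mass of the focal elements T contained in A.
    `m` maps frozensets to basic probability numbers."""
    return sum((v for t, v in m.items() if t <= a), Fraction(0))


def pl(m, a, theta):
    """Pl(A) = 1 - Bel(complement of A)."""
    return 1 - bel(m, theta - a)


def mobius(bel_fn, theta):
    """Recover m(S) = sum over T in S of (-1)^|S - T| Bel(T); zero masses are dropped."""
    m = {}
    for s in subsets(theta):
        total = sum(((-1) ** len(s - t)) * bel_fn(t) for t in subsets(s))
        if total != 0:
            m[s] = total
    return m
```

For example, with $\Theta = \{a, b, c\}$ and masses $1/2$ on $\{a\}$, $1/4$ on $\{a, b\}$ and $1/4$ on $\Theta$, Bel never exceeds Pl on any subset, and inverting Bel returns exactly the original masses.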

There exist convex sets of probabilities, expressed only as a set of intervals of probability,
which cannot be represented by belief functions. For example, suppose that the four events
denoted by $\{1, 2, 3, 4\}$ have the probabilities given by

$$p_1 = p_2 = \frac{1 - 2q}{2}, \qquad p_3 = p_4 = q,$$

where $q$ ranges over the values $0 \le q \le 1/4$. Table 4-1 gives the values of the
probability as a function of $q$, the belief Bel, the plausibility Pl, and the implied basic
probability number $m$ for all the events in the algebra generated by these four events. The
important point to notice is that $m(\cdot)$ is not non-negative for all events: some of the
implied basic probability numbers are negative.
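The negative basic probability numbers can be reproduced directly. This sketch assumes that the belief of an event is taken to be its lower probability over the family, i.e. the minimum of $\Pr(A)$ over $0 \le q \le 1/4$; since $\Pr(A)$ is linear in $q$, only the endpoints $q = 0$ and $q = 1/4$ need to be checked. The implied $m$ is then obtained by the inversion formula of Section 4.

```python
from fractions import Fraction
from itertools import combinations

THETA = (1, 2, 3, 4)


def prob(q):
    """The family in the text: p1 = p2 = (1 - 2q)/2, p3 = p4 = q."""
    return {1: (1 - 2 * q) / 2, 2: (1 - 2 * q) / 2, 3: q, 4: q}


def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# Pr(A) is linear in q, so its extremes over 0 <= q <= 1/4 occur at the endpoints.
ENDPOINTS = [Fraction(0), Fraction(1, 4)]


def lower(a):
    """Lower probability of event a (assumed here to play the role of Bel)."""
    return min(sum((prob(q)[i] for i in a), Fraction(0)) for q in ENDPOINTS)


def upper(a):
    """Upper probability of event a (assumed here to play the role of Pl)."""
    return max(sum((prob(q)[i] for i in a), Fraction(0)) for q in ENDPOINTS)


def implied_m(a):
    """Inversion of the lower envelope: m(A) = sum over B in A of (-1)^|A - B| lower(B)."""
    return sum(((-1) ** len(a - b)) * lower(b) for b in subsets(a))


negative = {tuple(sorted(a)): implied_m(a) for a in subsets(THETA) if implied_m(a) < 0}
print(negative)
```

Under this assumption every three-element event receives $m = -1/4$, while the implied numbers still sum to one over all events, so the lower envelope cannot be the belief function of any basic probability assignment.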
More generally, it can be shown that if a set of probability intervals is given for the
elements of a partition as

$$0 \le L_i \le p_i \le U_i \le 1, \qquad i = 1, \ldots, n,$$

then for there to exist a corresponding belief function $\mathrm{Bel}(\cdot)$, it is necessary
that both

$$\sum_{i=1}^{n} L_i + U_j - L_j \le 1 \quad \text{for all } j$$

and

$$\sum_{i=1}^{n} U_i + L_j - U_j \ge 1 \quad \text{for all } j.$$

This provides a quick and dirty test of whether or not an expressed set of probability
intervals is in fact representable by a belief function. Unfortunately, the sufficient
conditions are considerably more complex.
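The quick and dirty test translates directly into code. This is a sketch of the two necessary conditions only (passing it does not guarantee that a belief function exists); the function name `passes_quick_test` is hypothetical, and `Fraction` inputs are used so the comparisons are exact.

```python
from fractions import Fraction


def passes_quick_test(lower, upper):
    """Necessary (not sufficient) conditions for intervals L_i <= p_i <= U_i on the
    elements of a partition to be representable by a belief function."""
    if len(lower) != len(upper):
        raise ValueError("need one interval per element of the partition")
    # Input sanity check: 0 <= L_i <= U_i <= 1 for every element.
    if any(not (0 <= l <= u <= 1) for l, u in zip(lower, upper)):
        return False
    sum_l, sum_u = sum(lower), sum(upper)
    for l_j, u_j in zip(lower, upper):
        # First condition: sum_i L_i + U_j - L_j <= 1.
        if sum_l + u_j - l_j > 1:
            return False
        # Second condition: sum_i U_i + L_j - U_j >= 1.
        if sum_u + l_j - u_j < 1:
            return False
    return True
```

The singleton ranges from the example above, $[1/4, 1/2]$, $[1/4, 1/2]$, $[0, 1/4]$, $[0, 1/4]$, pass both conditions; the intervals $[3/5, 9/10]$ and $[3/10, 2/5]$ fail the first one, since $9/10 + 9/10 - 3/5 > 1$.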

One particularly nice feature of the theory of belief functions is that it distinguishes
between indifference and ignorance. Complete ignorance is represented by the vacuous
belief function, which assigns basic probability one to the set $\Theta$ and zero to every other
subset. Complete indifference assigns an equal amount to all singleton propositions and zero
to every other subset; this is precisely a uniform probability distribution on the elements of
the partition. Any degree of ignorance can be expressed quite naturally between the two
extremes of complete ignorance and a well-defined probability distribution.

The basic theory of belief functions requires that the frame of discernment be composed
of mutually exclusive propositions. This means that only one proposition at a time can be
true. In an expert system this condition is explicitly not satisfied; consequently, direct
application of the theory is impossible. We overcome this problem as follows. Let $\Omega$
be a set of mutually supporting propositions; that is, suppose that

$$\Omega = \{p_1, p_2, \ldots, p_n\}.$$

By mutually supporting we mean that any assignment of truth values to the propositions is
possible. Let $2^\Omega$ be a list of the possible assignments of truth values to the elements
of $\Omega$. If we now let $2^\Omega$ be the frame of discernment, then it is possible to use the
theory of belief functions.
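The construction of the frame $2^\Omega$ can be illustrated as follows. In this sketch (the function names and the tuple-of-pairs encoding of an assignment are illustrative assumptions) each element of the frame is one complete truth assignment, so the elements are mutually exclusive, and a statement such as "$p_1$ is true" becomes the subset of assignments in which $p_1$ holds.

```python
from itertools import product


def truth_assignment_frame(propositions):
    """The frame 2^Omega: every assignment of truth values to the propositions.
    Each element is a tuple of (proposition, truth value) pairs."""
    return [tuple(zip(propositions, values))
            for values in product((True, False), repeat=len(propositions))]


def event(frame, proposition, value=True):
    """The subset of the frame on which `proposition` has the given truth value."""
    return frozenset(w for w in frame if dict(w)[proposition] == value)
```

For three propositions the frame has $2^3 = 8$ elements, "$p_1$ is true" is a four-element subset, and it is disjoint from "$p_1$ is false". Note that a basic probability assignment on this frame has one entry for each of its $2^8$ subsets.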

As originally proposed, this theory used a rule of combination (now widely known as
Dempster's Rule of Combination) for two basic probability assignments, $m_1$ and $m_2$, of
the form

$$(m_1 \oplus m_2)(A) = K \sum_{S \cap T = A} m_1(S)\, m_2(T) \quad \text{for } A \ne \emptyset, \tag{4.1}$$

where the normalization constant $K$ is chosen so the combined basic probabilities add to
one. We have found this rule to be unsatisfactory and are currently exploring some
alternative possibilities. Consider repeated application of this rule of combination, viz.,

$$m_1 \oplus m_2 \oplus \cdots \oplus m_n. \tag{4.2}$$

What are the possible limits as $n$ increases? It is fairly easy to see that both the uniform
probability distribution and any belief function with a single focal element (including the
vacuous belief function) are solutions and there are no others. It is unreasonable to expect
that any rule of combination, when iterated in this manner, would yield every belief
function as a possible limit; on the other hand, the observed behavior of the combination
rule given in Equation 4.1 appears too restricted.
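Equation 4.1 and its repeated application can be sketched as below. The mass functions are assumed to be dicts from `frozenset` focal elements to `Fraction` masses; the constant $K$ is computed as one over one minus the mass that falls on the empty set, which is what makes the combined masses add to one.

```python
from fractions import Fraction


def dempster(m1, m2):
    """Dempster's rule (Equation 4.1) for mass functions stored as dicts
    mapping frozenset focal elements to masses."""
    raw = {}
    conflict = Fraction(0)
    for s, a in m1.items():
        for t, b in m2.items():
            inter = s & t
            if inter:
                raw[inter] = raw.get(inter, Fraction(0)) + a * b
            else:
                conflict += a * b  # mass that would fall on the empty set
    if conflict == 1:
        raise ValueError("totally conflicting evidence: combination undefined")
    k = 1 / (1 - conflict)  # normalization constant K
    return {a: k * v for a, v in raw.items()}


def iterate(m, times):
    """m combined with itself `times` times, i.e. Equation 4.2 with equal terms."""
    result = m
    for _ in range(times - 1):
        result = dempster(result, m)
    return result
```

The uniform distribution and the vacuous belief function are both unchanged by combination with themselves, while a non-uniform distribution such as masses $3/5$ on $\{a\}$ and $2/5$ on $\{b\}$ becomes $27/35$ and $8/35$ after three copies, drifting towards the single focal element $\{a\}$.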

Typically, two different belief functions will not be defined over the same frame of
discernment, and a combination rule such as Equation 4.1 cannot be directly applied. One
frame is compatible with another if it can be obtained from it by splitting some of its
possibilities into finer possibilities. The frame of the finer analysis is called a refinement of
the original; the original is called a coarsening of the refinement. Before application of a
rule of combination it may be necessary to refine one or both of the frames in order to
obtain a common frame of discernment.
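Refinement to a common frame can be sketched as follows, assuming the truth-assignment frames of this section (an assignment is encoded as a tuple of (proposition, value) pairs, a hypothetical choice). Each coarse assignment maps to the finer assignments that agree with it, and each focal element is carried to the union of the images of its elements with its mass unchanged, as in Shafer's vacuous extension.

```python
from fractions import Fraction
from itertools import product


def frame(props):
    """Frame of discernment for `props`: every assignment of truth values."""
    return [tuple(zip(props, vals)) for vals in product((True, False), repeat=len(props))]


def refining_map(coarse_props, fine_props):
    """Map each coarse assignment to the finer assignments that agree with it
    (coarse_props must be a subset of fine_props)."""
    fine = frame(fine_props)
    return {w: frozenset(v for v in fine if set(w) <= set(v)) for w in frame(coarse_props)}


def refine_mass(m, refining):
    """Carry a mass function to the finer frame: each focal element A goes to the
    union of refining[theta] over theta in A; masses are unchanged."""
    out = {}
    for focal, mass in m.items():
        image = frozenset().union(*(refining[theta] for theta in focal))
        out[image] = out.get(image, Fraction(0)) + mass
    return out
```

A belief of $7/10$ that $p_1$ is true (the remaining $3/10$ on the whole frame) becomes, on the frame for $p_1$ and $p_2$, $7/10$ on the two assignments with $p_1$ true and $3/10$ on all four assignments.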



5. REDUCING THE COMPUTATIONAL COMPLEXITY

There are considerable computational difficulties in using this theory. An initial
assignment of $2^n$ basic probability numbers must be made, where $n$ represents the
number of propositions in the frame of discernment $\Theta$. The required number of
evaluations in using any combination rule increases exponentially as more propositions are
included. It seems reasonable that intelligent exploitation of some structure could result in
computational savings.

One way to reduce computational complexity is to assume that each piece of evidence
either confirms or denies a single proposition rather than a disjunction. This is the
approach that Barnett takes in his work (Barnett, 1981). While this will reduce the number
of calculations from exponential to linear, it also means that the frame must be broken into
independent partitions. This is a very strong assumption and not likely to be satisfied in
practice. Here, we are interested in retaining the more natural possibility of dependence
among the propositions in the system.

Another possible approach would discount, at an early stage of the calculations, sets with
zero, or very small, basic probability assignments. Yet another approach is to ignore those
sets with a cardinality higher than a predetermined threshold. This is the approach we
take here. It is possible to reduce the computational problem from one of exponential time
to one of polynomial time, and the degree of the polynomial can be set in advance by
suitable choice of the threshold.
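The claimed reduction from exponential to polynomial work comes from counting the sets that survive the threshold. Assuming a frame with $n$ elements and threshold $k$, the number of subsets of cardinality at most $k$ is $\sum_{j=0}^{k} \binom{n}{j}$, a polynomial of degree $k$ in $n$, compared with $2^n$ subsets in total. A small illustrative count (the function name is hypothetical):

```python
from math import comb


def focal_sets_kept(n, k):
    """Number of subsets of an n-element frame with cardinality at most k
    (empty set included): sum over j <= k of C(n, j)."""
    return sum(comb(n, j) for j in range(k + 1))


for n in (10, 20, 30):
    print(n, 2 ** n, focal_sets_kept(n, 2))
```

With $k = 2$, a frame of 20 elements keeps 211 of its 1,048,576 subsets.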

A belief function provides both a lower bound and an upper bound for the probability.
The narrower the range of this interval, the more definite the knowledge about the
probability. It seems reasonable to require that any approximation to an m-function should
preserve the properties of an m-function. This produces one of the following three
possibilities:
1. a less definite assignment of uncertainty (a wider interval);

2. a more definite assignment of uncertainty (a narrower interval);

3. no change.

Suppose the cardinality of $\Theta$ is $n$ (that is, $\Theta$ contains $n$ propositions). The
approximations to be used involve neglecting m-function values attached to elements of
$2^\Theta$ with cardinality greater than a threshold value $k$. To restore the approximation to
an m-function requires some form of renormalization. To produce the first case (above) it is
proposed that the m-function be restored by moving all the ignored basic probability mass
to the element $\Theta$. To produce the second effect, the excess basic probability mass should
be added to the elements of $2^\Theta$ with cardinality less than (or equal to) the threshold
value $k$, in proportion to their original values.

6. AN OUTER APPROXIMATION

Denoting the approximations by $m^*(\cdot)$, $\mathrm{Bel}^*(\cdot)$ and $\mathrm{Pl}^*(\cdot)$ and dealing
with the conservative approach first, the desired results are as follows:

$$\mathrm{Bel}^*(A) \le \mathrm{Bel}(A), \qquad \mathrm{Pl}^*(A) \ge \mathrm{Pl}(A), \qquad A \subseteq \Theta. \tag{6.1}$$

The remaining requirement is that the function $m^*(\cdot)$ not violate the rules for an
m-function, no matter what the value of $k$. The three requirements that a function must
satisfy to be an m-function are simply

$$m(\emptyset) = 0, \qquad 0 \le m(A) \le 1, \qquad \sum_{A \subseteq \Theta} m(A) = 1.$$


We define $m^*$ to be an order $k$ outer approximation to $m$ as follows:

$$
m^*(A) =
\begin{cases}
0 & \text{if } A = \emptyset, \\
m(A) & \text{if } |A| \le k \text{ and } A \subseteq \Theta, \\
0 & \text{if } |A| > k \text{ and } A \ne \Theta, \\
1 - \sum_{B \subseteq \Theta,\ |B| \le k} m(B) = \sum_{B \subseteq \Theta,\ |B| > k} m(B) & \text{if } A = \Theta,
\end{cases}
\tag{6.2}
$$

where $k$ is the threshold cardinality and $|\cdot|$ represents the number of elements in the
set. The first requirement for $m^*(\cdot)$ to be an m-function is trivially satisfied, and the
second requirement is clearly satisfied for all the above parts (the last simply because the
sum must be less than or equal to the sum of all the $m(A)$, which is one). All that
remains is to verify the third condition for an m-function:

$$
\begin{aligned}
\sum_{A \subseteq \Theta} m^*(A)
&= m^*(\emptyset) + \sum_{A \subseteq \Theta,\ |A| \le k} m^*(A) + \sum_{A \subseteq \Theta,\ |A| > k,\ A \ne \Theta} m^*(A) + m^*(\Theta) \\
&= 0 + \sum_{A \subseteq \Theta,\ |A| \le k} m(A) + 0 + \sum_{A \subseteq \Theta,\ |A| > k} m(A) \\
&= \sum_{A \subseteq \Theta} m(A) = 1.
\end{aligned}
\tag{6.3}
$$

The range of possible values for $k$ is given by

$$0 \le k \le n - 1.$$

The value $k = 0$ always yields the vacuous probability assignment, and the value
$k = n - 1$ always yields the original probability assignment. It is clear that the smaller the
value of $k$, the more information is being neglected and the more vague the approximation
becomes (the interval widens). The higher the value of $k$, the less information is being
neglected, so the approximation should be closer to the original specification. Clearly there
is also a possibility that the new $m^*$-function will not be different from the original
m-function. This can happen when, for a specific value of $k$, all the elements of $2^\Theta$ with
greater cardinality have m-function values of zero.


It is now necessary to prove the assertions made in Equation 6.1. First consider the
belief $\mathrm{Bel}^*(\cdot)$. It is easier to carry out the proof in four parts corresponding to
Equation 6.2. Clearly $\mathrm{Bel}^*(\emptyset) = 0$, hence the first part is satisfied. The second part
is satisfied as $\mathrm{Bel}^*(A) = \mathrm{Bel}(A)$ if the cardinality of $A$ is less than or equal to $k$.
The third part follows from

$$\mathrm{Bel}^*(A) = \sum_{B \subseteq A} m^*(B) = \sum_{B \subseteq A,\ |B| \le k} m(B) \le \sum_{B \subseteq A} m(B) = \mathrm{Bel}(A). \tag{6.4}$$

Recall that $\mathrm{Bel}(\Theta) = 1$ is one of the requirements of a belief function. For the final
part of the proof it is required to show that $\mathrm{Bel}^*(\Theta) = 1$ (this actually follows
automatically since $m^*(\cdot)$ satisfies the conditions of an m-function):

$$
\begin{aligned}
\mathrm{Bel}^*(\Theta)
&= \sum_{B \subseteq \Theta} m^*(B) = \sum_{B \subseteq \Theta,\ |B| \le k} m^*(B) + m^*(\Theta) \\
&= \sum_{B \subseteq \Theta,\ |B| \le k} m(B) + \sum_{B \subseteq \Theta,\ |B| > k} m(B) \\
&= \sum_{B \subseteq \Theta} m(B) = \mathrm{Bel}(\Theta).
\end{aligned}
\tag{6.5}
$$

Hence the condition on $\mathrm{Bel}^*(\cdot)$ has been satisfied. The condition on $\mathrm{Pl}^*(\cdot)$
now follows immediately:

$$\mathrm{Pl}^*(A) = 1 - \mathrm{Bel}^*(\bar{A}) \ge 1 - \mathrm{Bel}(\bar{A}) = \mathrm{Pl}(A). \tag{6.6}$$


It has now been shown that this form of approximation gives the desired effect of
widening the interval between the belief and the plausibility. The computational saving is
made because of all the zeros used to replace the original assessments for sets with
cardinality greater than $k$. Clearly these sets can now be ignored when performing a
combination. This form of approximation could prove very useful in large systems; however,
there is a danger that the approximation may not be very good. The best results will
undoubtedly come when small basic probability numbers are assigned to sets with high
cardinality. It may prove to be a worthwhile exercise to increment the value of $k$ on
successive iterations until two successive iterations yield close results. This sort of
numerical exercise is a task for the future.
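Definition 6.2 can be sketched in a few lines, again assuming mass functions stored as dicts from `frozenset` to `Fraction`. The function `outer_approx` keeps $m$ on sets of cardinality at most $k$ and moves all other mass, including $m(\Theta)$, onto $\Theta$; the usage below checks the bounds of Equation 6.1 on a small example.

```python
from fractions import Fraction
from itertools import combinations


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def bel(m, a):
    """Bel(A): total mass of focal elements contained in A."""
    return sum((v for t, v in m.items() if t <= a), Fraction(0))


def outer_approx(m, theta, k):
    """Order-k outer approximation (Equation 6.2): keep m on sets of cardinality
    <= k and move all remaining mass (including m(Theta)) onto Theta."""
    if not 0 <= k <= len(theta) - 1:
        raise ValueError("k must satisfy 0 <= k <= n - 1")
    approx = {a: v for a, v in m.items() if len(a) <= k and v != 0}
    moved = sum((v for a, v in m.items() if len(a) > k), Fraction(0))
    if moved:
        approx[theta] = moved
    return approx


theta = frozenset("abcd")
m = {frozenset("a"): Fraction(2, 5), frozenset("ab"): Fraction(3, 10),
     frozenset("abc"): Fraction(1, 5), theta: Fraction(1, 10)}
o = outer_approx(m, theta, 1)
print(o)
```

For $k = 1$ the masses on $\{a, b\}$, $\{a, b, c\}$ and $\Theta$ are pooled onto $\Theta$ (total $3/5$); $k = 0$ gives the vacuous assignment and $k = n - 1 = 3$ returns $m$ unchanged.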

7. AN INNER APPROXIMATION

In a similar manner to Section 6, the opposite effect of narrowing the interval between
the belief and the plausibility can be achieved. Denoting these approximations by
$m_*(\cdot)$, $\mathrm{Bel}_*(\cdot)$ and $\mathrm{Pl}_*(\cdot)$, the desired results are now as follows:

$$\mathrm{Bel}_*(A) \ge \mathrm{Bel}(A), \qquad \mathrm{Pl}_*(A) \le \mathrm{Pl}(A), \qquad A \subseteq \Theta. \tag{7.1}$$

Again the function $m_*(\cdot)$ must not violate the rules for an m-function. It is convenient
to set up an intermediary function for ease of presentation:

$$M_k(A) = \sum_{B \subseteq A,\ |B| \le k} m(B). \tag{7.2}$$

We define $m_*$ to be an order $k$ inner approximation to $m$ as follows:

$$
m_*(A) =
\begin{cases}
0 & \text{if } A = \emptyset, \\
0 & \text{if } |A| > k,\ A \subseteq \Theta, \\
m(A) + \sum_{D \supseteq A,\ |D| > k} m(D)\, \dfrac{m(A)}{M_k(D)} & \text{otherwise.}
\end{cases}
\tag{7.3}
$$

Once again the first requirement of an m-function is trivially satisfied. As all the
component parts of Equation 7.3 are non-negative, it is sufficient to verify the third clause
of an m-function. That is, we must verify that the component parts of $m_*(\cdot)$ sum to one
(here we assume that $M_k(D) > 0$ whenever $m(D) > 0$, so that Equation 7.3 is well defined):

$$
\begin{aligned}
\sum_{A \subseteq \Theta} m_*(A)
&= \sum_{|A| \le k} m(A) + \sum_{|D| > k} \sum_{A \subseteq D,\ |A| \le k} m(D)\, \frac{m(A)}{M_k(D)} \\
&= \sum_{|A| \le k} m(A) + \sum_{|D| > k} m(D)\, \frac{M_k(D)}{M_k(D)} \\
&= \sum_{|A| \le k} m(A) + \sum_{|D| > k} m(D) \\
&= \sum_{A \subseteq \Theta} m(A) = 1.
\end{aligned}
\tag{7.4}
$$
The conditions for an m-function are thus satisfied. The range of possible values for $k$ is

$$1 \le k \le n - 1.$$

The value $k = 1$ corresponds to approximating the belief function by a probability
distribution, and the value $k = n - 1$ yields the original probability assignment whenever
$m(\Theta) = 0$ (otherwise only the mass on $\Theta$ is redistributed).


It is now necessary to prove the assertions made in Equation 7.1. The proof for the
$\mathrm{Pl}_*(\cdot)$ function part will follow in a similar fashion to that for the $\mathrm{Pl}^*(\cdot)$ function
above, but it is necessary to prove the belief part first. Clearly $\mathrm{Bel}_*(\emptyset) = 0$, and
$\mathrm{Bel}_*(\Theta) = 1$ (as $m_*(\cdot)$ satisfies the conditions for an m-function). Now it is necessary
to prove the assertion in the cases where, for any subset $A$ of $\Theta$, $|A|$ is either greater
than $k$ or less than or equal to $k$. In the latter case the following relationships hold:

$$
\begin{aligned}
\mathrm{Bel}_*(A)
&= \sum_{E \subseteq A} m_*(E) = \sum_{E \subseteq A,\ |E| \le k} m_*(E) \\
&= \sum_{E \subseteq A,\ |E| \le k} \Bigl[ m(E) + \sum_{|D| > k,\ E \subseteq D} m(D)\, \frac{m(E)}{M_k(D)} \Bigr] \\
&= \sum_{E \subseteq A,\ |E| \le k} m(E) + c, \quad \text{say},
\end{aligned}
\tag{7.5}
$$

where $c \ge 0$. But since the cardinality of $A$ is assumed to be less than or equal to $k$,
every subset $E$ of $A$ also has $|E| \le k$, so that

$$\sum_{E \subseteq A,\ |E| \le k} m(E) = \sum_{E \subseteq A} m(E) = \mathrm{Bel}(A). \tag{7.6}$$

Hence $\mathrm{Bel}_*(A) \ge \mathrm{Bel}(A)$ for the case where $|A| \le k$. Now a proof for the other
case ($|A| > k$) is needed. Equation 7.5 still holds and serves as the starting point here.


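$$\mathrm{Bel}_*(A) = \sum_{E \subseteq A,\ |E| \le k} m(E) + c = \mathrm{Bel}(A) - \sum_{E \subseteq A,\ |E| > k} m(E) + c. \tag{7.7}$$

If the terms in the constant $c$ are expanded and collected in a suitably different way, it
becomes apparent that $c$ contains the summation over $|E| > k$: each set $D \subseteq A$ with
$|D| > k$ contributes $m(D) \sum_{E \subseteq D,\ |E| \le k} m(E) / M_k(D) = m(D)$ to $c$, and the
terms arising from sets $D \not\subseteq A$ make up the remainder. That is,

$$c = \sum_{E \subseteq A,\ |E| > k} m(E) + c_1, \quad \text{say}, \tag{7.8}$$

with $c_1 \ge 0$. Hence the conditions are satisfied, as now it is clear that

$$\mathrm{Bel}_*(A) = \mathrm{Bel}(A) + c_1. \tag{7.9}$$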


The conditions on the plausibility function now follow immediately:

$$\mathrm{Pl}_*(A) = 1 - \mathrm{Bel}_*(\bar{A}) \le 1 - \mathrm{Bel}(\bar{A}) = \mathrm{Pl}(A). \tag{7.10}$$
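Definition 7.3 can be sketched the same way (dicts from `frozenset` to `Fraction`, a representation assumed for illustration). Each set $D$ with $|D| > k$ is dropped and its mass shared among the subsets of $D$ with cardinality at most $k$, in proportion to their original masses, using $M_k(D)$ from Equation 7.2. The text does not say what to do when $M_k(D) = 0$; this sketch simply raises an error in that case.

```python
from fractions import Fraction
from itertools import combinations


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def bel(m, a):
    """Bel(A): total mass of focal elements contained in A."""
    return sum((v for t, v in m.items() if t <= a), Fraction(0))


def inner_approx(m, theta, k):
    """Order-k inner approximation (Equation 7.3)."""
    if not 1 <= k <= len(theta) - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    small = {a: v for a, v in m.items() if 0 < len(a) <= k and v != 0}
    approx = dict(small)
    for d, mass_d in m.items():
        if len(d) <= k or mass_d == 0:
            continue
        inside = {a: v for a, v in small.items() if a <= d}
        m_k = sum(inside.values(), Fraction(0))  # M_k(D), Equation 7.2
        if m_k == 0:
            raise ValueError(f"M_k(D) = 0 for D = {sorted(d)}; Equation 7.3 is undefined")
        for a, v in inside.items():
            approx[a] += mass_d * v / m_k  # share of m(D) proportional to m(A)
    return approx


theta = frozenset("abcd")
m = {frozenset("a"): Fraction(2, 5), frozenset("ab"): Fraction(3, 10),
     frozenset("abc"): Fraction(1, 5), theta: Fraction(1, 10)}
i2 = inner_approx(m, theta, 2)
print(i2)
```

In this example every large set contains both small focal elements, so for $k = 2$ the masses on $\{a\}$ and $\{a, b\}$ are simply rescaled to $4/7$ and $3/7$, and for $k = 1$ all the mass ends up on $\{a\}$, a probability distribution.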

It has now been shown that this form of approximation gives the desired effect of
narrowing the interval between the belief and the plausibility. A slightly better
computational saving is achieved with this inner approximation than with the outer
approximation because one additional value of $m$ is known to be zero, namely
$m_*(\Theta)$. The effects of the approximations are summarized in Table 7-1.


The terms "increased" and "decreased" in Table 7-1 should not be interpreted strictly; that
is, they include the possibility of no change.


Both of these approximations set basically the same elements to zero, for a given value
of $k$, to achieve a computational saving (the one exception is $\Theta$). It may be possible to
combine the two approaches. As one approximation achieves a wider interval and the other
achieves the opposite effect, it should be possible to find some optimal combination of the
two approximations. There are obviously many possible measures to optimize. A
particularly simple one is to choose the proportionality constant $\beta$ to minimize

$$\sum_{A \subseteq \Theta} \rho\bigl[\beta\, \mathrm{Bel}_*(A) - \mathrm{Bel}(A)\bigr].$$

If $\rho(x) = x^2$ the solution is

$$\hat{\beta} = \frac{\sum_{A \subseteq \Theta} \mathrm{Bel}_*(A)\, \mathrm{Bel}(A)}{\sum_{A \subseteq \Theta} \mathrm{Bel}_*(A)\, \mathrm{Bel}_*(A)}.$$

Table 7-1: Summary of the Approximations

    Cardinality of A        Bel*(.) (outer)    Bel_*(.) (inner)
    <= k                    fixed              increased
    > k                     decreased          increased

    Cardinality of Θ - A    Pl*(.) (outer)     Pl_*(.) (inner)
    <= k                    fixed              decreased
    > k                     increased          decreased

We  do  not  yet  have  any  numerical  experience  with  this  approximation  and  we  are 
examining  other  measures  of  distance. 

We note that one of the primary motivations for the use of belief functions is the
uncertainty attached to the probability assessments of the expert and the user. An order 1
inner approximation to a belief function is a probability distribution. An interesting
question arises: is there any sense in which the order 1 inner approximation is an optimal
approximation (estimate?) of the uncertain probability distribution represented by the
belief function?
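Why an order 1 approximation is a probability distribution can be checked numerically: when every focal element is a singleton, Bel(A) and Pl(A) coincide for every A, so the belief interval collapses to a single additive value. The sketch below uses the standard definitions Bel(A) = Σ_{B⊆A} m(B) and Pl(A) = Σ_{B∩A≠∅} m(B). The dict-of-frozensets representation and the helper names are our own assumptions, and the masses are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List

Mass = Dict[FrozenSet[Hashable], float]  # focal element -> mass, m(empty set) = 0


def all_subsets(frame: Iterable[Hashable]) -> List[FrozenSet[Hashable]]:
    """Every subset of the frame, from the empty set up to the frame itself."""
    items = list(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def bel(m: Mass, a: FrozenSet[Hashable]) -> float:
    """Bel(A): total mass of focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)


def pl(m: Mass, a: FrozenSet[Hashable]) -> float:
    """Pl(A) = 1 - Bel(complement of A): total mass of focal elements meeting A."""
    return sum(v for b, v in m.items() if b & a)


def is_probability(m: Mass, frame: Iterable[Hashable], tol: float = 1e-12) -> bool:
    """True if Bel(A) == Pl(A) for every subset A, i.e. Bel is additive."""
    return all(abs(bel(m, a) - pl(m, a)) <= tol for a in all_subsets(frame))


theta = ["x", "y", "z"]
# Order 1: every focal element is a singleton (illustrative masses).
m_order1 = {frozenset({"x"}): 0.5, frozenset({"y"}): 0.3, frozenset({"z"}): 0.2}
# A focal element of cardinality 2 leaves a gap between Bel and Pl.
m_order2 = {frozenset({"x"}): 0.5, frozenset({"y", "z"}): 0.5}
```

For `m_order2` the interval for {y} is [0, 0.5], which is exactly the kind of uncertainty about the probability distribution that an order 1 approximation discards.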




8.  COMPUTER  IMPLEMENTATION 


We have developed a number of computer programs to use belief functions with rule-based
systems. The following material discusses the algorithm to be followed when using forward
chaining. The steps for backward chaining are similar.

The basic mechanisms for propagating beliefs through the system are extension of a
belief function to a refined frame and combination with another belief function. The user of
the system is asked to provide evidence in the form of a belief function. If the
preconditions are matched, a rule fires (becomes instantiated). Note that all of the
preconditions for a rule must be matched before the rule actually fires; therefore, a user
may be asked to input a number of beliefs before a rule fires.
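The combination step can be sketched with Dempster's rule of combination, the standard rule for combining belief functions: products of masses are assigned to intersections of focal elements, and the mass that falls on the empty set (the conflict) is renormalised away. The section does not state the rule explicitly, so treat this as an illustration under that assumption. Both inputs are assumed to be defined on the same frame already, and the names `user` and `expert` and their masses are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable

Mass = Dict[FrozenSet[Hashable], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions defined on the same frame (Dempster's rule)."""
    combined: Mass = {}
    conflict = 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            c = a & b
            if c:
                combined[c] = combined.get(c, 0.0) + va * vb
            else:
                conflict += va * vb  # product mass that lands on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalise so that the surviving masses sum to one.
    return {c: v / (1.0 - conflict) for c, v in combined.items()}


theta = frozenset({"x", "y"})
user = {frozenset({"x"}): 0.6, theta: 0.4}    # illustrative user evidence
expert = {frozenset({"y"}): 0.5, theta: 0.5}  # illustrative expert belief
result = dempster_combine(user, expert)
```

Here the conflict is 0.6 × 0.5 = 0.3, so each surviving product is divided by 0.7.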

When a rule fires, the current frame is refined and the current belief is extended to the
rest of the elements. The extension of the current belief is combined with the extension of
the expert-supplied belief attached to the rule. This process is repeated until a
desired goal is reached.
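The extension step can be sketched as a vacuous extension. When the frame for the current proposition X is refined to take in a new proposition Y, each element x is split into the pairs (x, y), and each focal element keeps its mass on the corresponding cylinder set, so no belief about Y is introduced. The source does not fix the frame representation here; the product-frame encoding, the function names, and the masses below are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

Mass = Dict[FrozenSet[Hashable], float]
PairMass = Dict[FrozenSet[Tuple[Hashable, Hashable]], float]


def vacuous_extension(m: Mass, new_values: Sequence[Hashable]) -> PairMass:
    """Carry m from frame Theta_X to the refined frame Theta_X x Theta_Y.

    Each element x of Theta_X is refined into the pairs (x, y); each focal
    element B keeps its mass and becomes the cylinder B x Theta_Y, so no
    belief about the new proposition Y is introduced.
    """
    extended: PairMass = {}
    for b, v in m.items():
        cylinder = frozenset(product(b, new_values))
        extended[cylinder] = extended.get(cylinder, 0.0) + v
    return extended


def bel(m: Dict[FrozenSet, float], a: FrozenSet) -> float:
    """Bel(A): total mass of focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)


current = {frozenset({"x1"}): 0.7, frozenset({"x1", "x2"}): 0.3}  # illustrative
extended = vacuous_extension(current, ["y_true", "y_false"])
```

Belief in a cylinder set is unchanged by the extension, while any statement that also constrains Y (for example "x1 and y_true") receives zero belief until another belief function is combined with it.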

An expert will have previously supplied his beliefs concerning each of these rules, and
these beliefs will be attached to the rules. Rules may have a number of precondition clauses
but must have only one resultant clause. If a possible rule has a disjunctive precondition,
the rule is split into two or more rules with single (or possibly conjunctive) preconditions
and the same resultant clause. If a possible rule has a conjunctive resultant clause, the rule
is split into two or more rules with a single (or possibly disjunctive) result and the same
preconditions. Note that this structure implies that the underlying graph is a Chow tree
(Chow and Liu, 1968). A Chow tree is a directed (and connected) graph with the property
that there are no cycles in the corresponding undirected graph.
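Both operations described above, splitting a rule so that each piece has one resultant clause and checking that the resulting graph is connected with no cycles once edge directions are ignored, can be sketched as follows. The `Rule` representation (preconditions in disjunctive form as tuples of clause strings) and the function names are hypothetical, not the authors' data structures; the cycle check is a standard union-find.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule representation.

    preconditions: alternatives joined by "or"; each alternative is a tuple
    of clauses joined by "and".
    results: clauses joined by "and"; one clause may itself be a disjunction
    written as a single string.
    """
    preconditions: Tuple[Tuple[str, ...], ...]
    results: Tuple[str, ...]


def split_rule(rule: Rule) -> List[Rule]:
    """Split a rule so that each piece has a single (possibly conjunctive)
    precondition and a single resultant clause."""
    return [Rule((alt,), (res,))
            for alt in rule.preconditions
            for res in rule.results]


def is_chow_tree(rules: List[Rule]) -> bool:
    """True if the graph with an edge from every precondition clause to the
    rule's resultant clause is connected and has no cycles when edge
    directions are ignored."""
    edges = [(p, res)
             for r in rules
             for alt in r.preconditions
             for p in alt
             for res in r.results]
    nodes = {n for edge in edges for n in edge}
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v already connected: this edge closes a cycle
        parent[ru] = rv
    # Connected means every node ends up with the same root.
    return len({find(n) for n in nodes}) <= 1


# Usage: "if (A and B) or C then D", followed by "if D then E".
pieces = split_rule(Rule(preconditions=(("A", "B"), ("C",)), results=("D",)))
rule_base = pieces + [Rule(preconditions=(("D",),), results=("E",))]
```

A rule base containing A → B, B → C and A → C would be rejected, since those three edges form a cycle in the undirected graph.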

Each rule base requires that an expert supply belief functions for each of the rules. In
most cases, these expert-supplied beliefs will not change from one use of the system to the
next.

The system allows both forward and backward chaining. In a typical chaining program
without belief functions, when a user supplies the fact(s) for a rule, the rule fires and a
conclusion is reached with certainty. In this system, the user supplies evidence in the
form of a belief function, and the expert-supplied belief function for that rule is retrieved.
All of the precondition clauses of the rule must be checked because they, too, may have
attached belief functions: previous rule instantiations may have created a belief function
for these if clauses, and an if clause may also have a belief function attached to it from a
previous use as an evidence node.

From a computer programming standpoint, this means that many belief functions must be
created and stored, and additional checking must be performed to determine whether these
belief functions are to be used with the current rule. This is determined mainly by looking
at the active-set of each belief function. The active-set is a list of the propositions that a
belief function pertains to. Different rules may share some members of their active-set
lists, but no two rules should have exactly the same members. The procedure that takes two
belief functions and defines them on a compatible frame of discernment is called
refinement.
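The bookkeeping described above, deciding which stored belief functions bear on the current rule by comparing active-sets and finding the propositions that a compatible frame must cover, can be sketched with plain set operations. The function names, and the reading of "bears on" as "shares at least one proposition with", are our assumptions; the text says only that the decision is made mainly by looking at the active-sets.

```python
from typing import Dict, FrozenSet, List, Sequence

ActiveSet = FrozenSet[str]  # propositions a belief function pertains to


def relevant_functions(rule_active: ActiveSet,
                       stored: Dict[str, ActiveSet]) -> List[str]:
    """Names of stored belief functions sharing at least one proposition
    with the active-set of the rule being fired (assumed criterion)."""
    return sorted(name for name, active in stored.items() if active & rule_active)


def common_frame(active_sets: Sequence[ActiveSet]) -> ActiveSet:
    """Union of the active-sets: the propositions a compatible frame must cover."""
    return frozenset().union(*active_sets)


def identical_active_sets(rules: Dict[str, ActiveSet]) -> List[List[str]]:
    """Groups of rules whose active-sets coincide exactly (expected to be empty)."""
    groups: Dict[ActiveSet, List[str]] = {}
    for name, active in rules.items():
        groups.setdefault(active, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical stored belief functions and their active-sets.
stored = {
    "evidence_A": frozenset({"A"}),
    "rule_AB_to_C": frozenset({"A", "B", "C"}),
    "evidence_X": frozenset({"X"}),
}
rule_active = frozenset({"A", "D"})
relevant = relevant_functions(rule_active, stored)
frame = common_frame([stored[name] for name in relevant] + [rule_active])
```

Refinement would then define each selected belief function on the frame spanned by `frame`, for example by vacuous extension over the propositions each one does not mention.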

After all of the belief functions associated with a rule firing have been combined into one
overall belief function, control is returned to the chaining program. The resulting belief
function is stored for further use and is output to the user along with the conclusion
(the result of the instantiated rule). The user can then begin this process again by
introducing new evidence.